87 research outputs found

    Sharing Semantic Resources

    The Semantic Web is an extension of the current Web in which information, so far created for human consumption, becomes machine readable, “enabling computers and people to work in cooperation”. Several challenges must still be met to realize this vision, the most important being how to share meaning formally represented with ontologies or, more generally, with semantic resources. This long-term goal of the Semantic Web converges in many ways with work in Human Language Technology, in particular with the development of Natural Language Processing applications, where multilingual lexical resources are badly needed. For instance, WordNet, one of the most important lexical resources, is also commonly regarded and used as an ontology. Another important recent phenomenon is the explosion of social collaboration: Wikipedia, the largest encyclopedia in the world, is now studied as an up-to-date, all-encompassing semantic resource. The main topic of this thesis is the collaborative management and exploitation of semantic resources, reusing already available resources such as Wikipedia and WordNet. This work presents a general environment for realizing the vision of shared and distributed semantic resources and describes a distributed three-layer architecture that enables rapid prototyping of cooperative applications for developing semantic resources

    Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter

    Microblogs are increasingly exploited for predicting prices and traded volumes of stocks in financial markets. However, it has been demonstrated that much of the content shared in microblogging platforms is created and publicized by bots and spammers. Yet, the presence (or lack thereof) and the impact of fake stock microblogs have never been systematically investigated. Here, we study 9M tweets related to stocks of the 5 main financial markets in the US. By comparing tweets with financial data from Google Finance, we highlight important characteristics of Twitter stock microblogs. More importantly, we uncover a malicious practice - referred to as cashtag piggybacking - perpetrated by coordinated groups of bots and likely aimed at promoting low-value stocks by exploiting the popularity of high-value ones. Among the findings of our study is that as much as 71% of the authors of suspicious financial tweets are classified as bots by a state-of-the-art spambot detection algorithm. Furthermore, 37% of them were suspended by Twitter a few months after our investigation. Our results call for the adoption of spam and bot detection techniques in all studies and applications that exploit user-generated content for predicting the stock market
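
As a rough illustration of what cashtag piggybacking can look like in the data, the sketch below flags tweets that mention a high-value cashtag alongside a low-value one. The regular expression, the `HIGH_CAP` and `LOW_CAP` sets, and the ticker symbols in them are all hypothetical; this is not the detection pipeline used in the study, only a toy co-occurrence check.

```python
import re

# Hypothetical cashtag pattern: a dollar sign followed by 1 to 6 letters.
CASHTAG = re.compile(r"\$([A-Za-z]{1,6})\b")

# Hypothetical ticker sets; a real analysis would derive them from market data.
HIGH_CAP = {"AAPL", "MSFT"}
LOW_CAP = {"XYZQ", "ABCD"}


def cashtags(text):
    """Return the set of upper-cased cashtag symbols mentioned in a tweet."""
    return {symbol.upper() for symbol in CASHTAG.findall(text)}


def looks_like_piggybacking(text):
    """True when a tweet mentions at least one high-cap and one low-cap symbol."""
    tags = cashtags(text)
    return bool(tags & HIGH_CAP) and bool(tags & LOW_CAP)
```

A single co-mention is weak evidence on its own; the abstract describes coordinated groups of accounts, so a check like this would at most serve as a first filter before looking at who posts such tweets and how often.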

    Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling

    Spambot detection in online social networks is a long-lasting challenge involving the study and design of detection techniques capable of efficiently identifying ever-evolving spammers. Recently, a new wave of social spambots has emerged, with advanced human-like characteristics that allow them to go undetected even by current state-of-the-art algorithms. In this paper, we show that efficient spambot detection can be achieved via an in-depth analysis of their collective behaviors, exploiting the digital DNA technique for modeling the behaviors of social network users. Inspired by its biological counterpart, in the digital DNA representation the behavioral lifetime of a digital account is encoded in a sequence of characters. Then, we define a similarity measure for such digital DNA sequences. We build upon digital DNA and the similarity between groups of users to characterize both genuine accounts and spambots. Leveraging such characterization, we design the Social Fingerprinting technique, which is able to discriminate between spambots and genuine accounts in both a supervised and an unsupervised fashion. We finally evaluate the effectiveness of Social Fingerprinting and we compare it with three state-of-the-art detection algorithms. Among the peculiarities of our approach is the possibility to apply off-the-shelf DNA analysis techniques to study online users' behaviors and to efficiently rely on a limited number of lightweight account characteristics
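
To make the encoding step concrete, here is a minimal sketch of a digital DNA representation. The three-letter alphabet (`A` for tweet, `C` for reply, `T` for retweet), the function names, and the choice of longest common substring as the similarity between sequences are assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations

# Assumed alphabet: one character per action type.
ALPHABET = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode(actions):
    """Encode a chronological list of action types as a digital DNA string."""
    return "".join(ALPHABET[action] for action in actions)


def longest_common_substring(a, b):
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def group_similarity(sequences):
    """Smallest pairwise common-substring length in a group of two or more sequences."""
    return min(longest_common_substring(a, b) for a, b in combinations(sequences, 2))
```

Under these assumptions, a group of accounts whose sequences share long common runs would look more coordinated than a group of accounts acting independently, which is the intuition the abstract builds on.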

    The Anatomy of Conspirators: Unveiling Traits using a Comprehensive Twitter Dataset

    The discourse around conspiracy theories is currently thriving amidst the rampant misinformation prevalent in online environments. Research in this field has been focused on detecting conspiracy theories on social media, often relying on limited datasets. In this study, we present a novel methodology for constructing a Twitter dataset that encompasses accounts engaged in conspiracy-related activities throughout the year 2022. Our approach centers on data collection that is independent of specific conspiracy theories and information operations. Additionally, our dataset includes a control group comprising randomly selected users who can be fairly compared to the individuals involved in conspiracy activities. This comprehensive collection effort yielded a total of 15K accounts and 37M tweets extracted from their timelines. We conduct a comparative analysis of the two groups across three dimensions: topics, profiles, and behavioral characteristics. The results indicate that conspiracy and control users exhibit similarity in terms of their profile metadata characteristics. However, they diverge significantly in terms of behavior and activity, particularly regarding the discussed topics, the terminology used, and their stance on trending subjects. Interestingly, there is no significant disparity in the presence of bot users between the two groups, suggesting that conspiracy and automation are orthogonal concepts. Finally, we develop a classifier to identify conspiracy users using 93 features, some of which are commonly employed in literature for troll identification. The results demonstrate a high accuracy level (with an average F1 score of 0.98), enabling us to uncover the most discriminative features associated with conspiracy-related accounts

    The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race

    Recent studies in social media spam and automation provide anecdotal argumentation of the rise of a new generation of spambots, so-called social spambots. Here, for the first time, we extensively study this novel phenomenon on Twitter and we provide quantitative evidence that a paradigm-shift exists in spambot design. First, we measure Twitter's current capabilities of detecting the new social spambots. Later, we assess the human performance in discriminating between genuine accounts, social spambots, and traditional spambots. Then, we benchmark several state-of-the-art techniques proposed by the academic literature. Results show that neither Twitter, nor humans, nor cutting-edge applications are currently capable of accurately detecting the new social spambots. Our results call for new approaches capable of turning the tide in the fight against this rising phenomenon. We conclude by reviewing the latest literature on spambot detection and we highlight an emerging common research trend based on the analysis of collective behaviors. Insights derived from both our extensive experimental campaign and survey shed light on the most promising directions of research and lay the foundations for the arms race against the novel social spambots. Finally, to foster research on this novel phenomenon, we make publicly available to the scientific community all the datasets used in this study. Comment: To appear in Proc. 26th WWW, 2017, Companion Volume (Web Science Track, Perth, Australia, 3-7 April, 2017)

    DNA-inspired online behavioral modeling and its application to spambot detection

    We propose a strikingly novel, simple, and effective approach to model online user behavior: we extract and analyze digital DNA sequences from user online actions and we use Twitter as a benchmark to test our proposal. We obtain an incisive and compact DNA-inspired characterization of user actions. Then, we apply standard DNA analysis techniques to discriminate between genuine and spambot accounts on Twitter. An experimental campaign supports our proposal, showing its effectiveness and viability. To the best of our knowledge, we are the first ones to identify and adapt DNA-inspired techniques to online user behavioral modeling. While Twitter spambot detection is a specific use case on a specific social media, our proposed methodology is platform and technology agnostic, hence paving the way for diverse behavioral characterization tasks

    Fame for sale: efficient detection of fake Twitter followers

    Fake followers are those Twitter accounts specifically created to inflate the number of followers of a target account. Fake followers are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere - hence impacting on economy, politics, and society. In this paper, we contribute along different dimensions. First, we review some of the most relevant existing features and rules (proposed by Academia and Media) for anomalous Twitter accounts detection. Second, we create a baseline dataset of verified human and fake follower accounts. Such baseline dataset is publicly available to the scientific community. Then, we exploit the baseline dataset to train a set of machine-learning classifiers built over the reviewed rules and features. Our results show that most of the rules proposed by Media provide unsatisfactory performance in revealing fake followers, while features proposed in the past by Academia for spam detection provide good results. Building on the most promising features, we revise the classifiers both in terms of reduction of overfitting and cost for gathering the data needed to compute the features. The final result is a novel Class A classifier, general enough to thwart overfitting, lightweight thanks to the usage of the less costly features, and still able to correctly classify more than 95% of the accounts of the original training set. We ultimately perform an information fusion-based sensitivity analysis, to assess the global sensitivity of each of the features employed by the classifier. The findings reported in this paper, other than being supported by a thorough experimental methodology and interesting on their own, also pave the way for further investigation on the novel issue of fake Twitter followers

    Modularity-based approach for tracking communities in dynamic social networks

    Community detection is a crucial task to unravel the intricate dynamics of online social networks. The emergence of these networks has dramatically increased the volume and speed of interactions among users, presenting researchers with unprecedented opportunities to explore and analyze the underlying structure of social communities. Despite a growing interest in tracking the evolution of groups of users in real-world social networks, the predominant focus of community detection efforts has been on communities within static networks. In this paper, we introduce a novel framework for tracking communities over time in a dynamic network, where a series of significant events is identified for each community. Our framework adopts a modularity-based strategy and does not require a predefined threshold, leading to a more accurate and robust tracking of dynamic communities. We validated the efficacy of our framework through extensive experiments on synthetic networks featuring embedded events. The results indicate that our framework can outperform the state-of-the-art methods. Furthermore, we utilized the proposed approach on a Twitter network comprising over 60,000 users and 5 million tweets throughout 2020, showcasing its potential in identifying dynamic communities in real-world scenarios. The proposed framework can be applied to different social networks and provides a valuable tool to gain deeper insights into the evolution of communities in dynamic social networks
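
For readers unfamiliar with the quantity behind a modularity-based strategy, the sketch below computes the standard modularity score of a fixed partition, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of the nodes in c. It covers only this static score for a simple, unweighted, undirected, non-empty graph, not the community-tracking framework described in the abstract; the function name and the toy graph are assumptions.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of a simple undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TWO_GROUPS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
ONE_GROUP = {node: "a" for node in range(6)}
```

On the toy graph, splitting the two triangles scores higher than lumping all nodes together, which is the kind of comparison a modularity-based method relies on when deciding how communities should be matched or split.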

    Signed Web Forms

    As more and more Web applications become available on the Internet, they are also becoming a standard way for many organizations and institutions to offer their services and improve the efficiency of office procedures. Some of these applications require the user to input some information, typically by filling out a form, and submit the data. In many cases the user is required to digitally sign the data submitted. The problem of the digital signature has been solved with appropriate algorithms based on the use of two different keys: the private key and the public key. The private key must be known only to its legitimate owner, certified by a Certification Authority, and must be protected from unauthorized access. This problem has been solved by means of smart cards and USB tokens. However, when the user decides to sign a document displayed on the screen, the software actually uses the private key to sign an internal representation of the document. Thus, another problem arises: the user must be sure that the document actually signed is the same document that was displayed. For some years, WYSIWYS (What You See Is What You Sign) technology has been suggested, so that users know exactly what they sign. We propose an architecture based on this technology. The signing module is embedded in a Web Service that must be invoked to obtain the digital signature of a given document. This Web Service shows the document to the user, who decides whether to sign it or not. Finally, we have tested this architecture by implementing a prototype of a form-based Web application

    Descrizione e gestione di workflow documentali con una applicazione basata su XML

    Abstract originally available in Italian only. Workflow systems coordinate all the operations involved in processing and transmitting documents, specifying the activities and roles of everyone who takes part in the work process. A document workflow follows a document throughout its life cycle, providing constant control over its compilation. This study seeks to shed light on the various issues that arise when describing document procedures. To this end, a conceptual model is defined that allows a document procedure, and all the activities that can be performed on the document, to be described in detail. To develop the model, XML technology was adopted both to structure the documents and to represent all the information about the flow. An agent is understood as any entity, human or software, capable of interacting with the document, while a document flow is understood as the set of all possible paths the document follows during its life cycle, passing from one agent to another. The document flow is described in a declarative language by listing all the agents that take part in the flow and specifying all the operations each agent can perform on the document instance. The documents processed by the various agents have a structure defined by an XML schema and are accompanied throughout their life cycle by other documents containing information about the flow, the constraints, and the presentation of the data. Particular emphasis is given to the problems of merging two or more documents filled in concurrently by multiple agents. As for the design of a document workflow management system, two architectural solutions are analyzed: centralized and distributed. To render the documents to be processed, the XSmiles browser is used, which can display XHTML documents with embedded XForms modules. Using innovative techniques such as XML-Signature, all aspects related to signing the documents modified by the agents were examined